Conversation
- Fix wall-clock claim: 41% -> 22% to match benchmark table
- Fix dual-model speedup rounding: 1.7x -> 1.6x (10.0/6.1 = 1.64)
- Fix run_config API: use dd.set_run_config() instead of passing to create()
Build MkDocs site on PRs that touch docs and deploy to Cloudflare Pages. Each PR gets a browseable preview URL posted as a comment. Notebook tutorials use placeholder stubs since they require API keys to execute. Requires CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID repo secrets.
Greptile Summary

This PR adds the "Async All the Way Down" dev note documenting the async task-queue scheduler and its performance impact, along with companion assets, an author entry, and an updated nav entry in `mkdocs.yml`.
| Filename | Overview |
|---|---|
| docs/devnotes/posts/async-all-the-way-down.md | New dev note covering the async engine; technical claims verified against the codebase — API usage, env vars, and benchmark numbers are all accurate. |
| docs/devnotes/posts/owning-the-model-stack.md | Cross-reference link to the new async-all-the-way-down.md post added; no other content changes. |
| docs/devnotes/.authors.yml | New amanoel author entry added, matching the author slug used in the new post's front matter. |
| mkdocs.yml | New "Async All the Way Down" nav entry inserted at the top of the Dev Notes section (most-recent-first ordering). |
| .github/workflows/docs-preview.yml | Docs preview CI workflow; no changes to build logic visible in this PR diff. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Cell enters Frontier\nwhen upstream deps satisfied] --> B[AsyncTaskScheduler\nacquires submission semaphore slot]
    B --> C{LLM-bound\ntask?}
    C -- Yes --> D[Acquire LLM-wait semaphore\nRelease submission semaphore]
    C -- No --> E[Hold submission slot\nfor full duration]
    D --> F[Generator makes\nLLM request via ThrottledModelClient]
    E --> G[Generator runs\nCPU/non-LLM work]
    F --> H{Provider\nresponse?}
    G --> I[Release submission slot\nMark cell complete]
    H -- 429 --> J[AIMD: cut concurrency\nDefer task to frontier]
    H -- Success --> K[Release LLM-wait slot\nMark cell complete]
    J --> A
    K --> L[CompletionTracker: unlock\ndownstream cells]
    I --> L
    L --> A
    K --> M[Row group complete?\nFlush to Parquet]
```
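The 429 branch in the flowchart follows the classic AIMD (additive-increase, multiplicative-decrease) pattern: grow the concurrency limit one slot per success, halve it on a rate-limit response. A minimal sketch of that policy (the class name and numeric constants are illustrative, not taken from the actual scheduler):

```python
class AIMDConcurrency:
    """Additive-increase / multiplicative-decrease cap on in-flight LLM calls."""

    def __init__(self, initial: int = 8, minimum: int = 1, maximum: int = 64):
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_success(self) -> None:
        # Additive increase: probe for headroom one slot at a time.
        self.limit = min(self.limit + 1, self.maximum)

    def on_rate_limit(self) -> None:
        # Multiplicative decrease: back off hard on a 429.
        self.limit = max(self.limit // 2, self.minimum)


ctl = AIMDConcurrency(initial=8)
ctl.on_success()      # limit: 8 -> 9
ctl.on_rate_limit()   # limit: 9 -> 4
```

The asymmetry is the point: recovery is slow and linear, backoff is immediate, so a burst of 429s quickly drains pressure on the provider while successes cautiously reclaim throughput.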
Reviews (14): Last reviewed commit: "docs: address review feedback on async b..."
Docs preview: https://a54387d5.dd-docs-preview.pages.dev
Force-pushed 7055573 to e434aad
Add DAG subtitle to sync-vs-async timeline figure and bridge the surrounding text to explain which workload shape is being shown.
Regenerate scale-model-timeline and scale-boxplot from nginx access logs (column_progress.csv, sync/summary.json) instead of buffered execution logs. Optimize both PNGs to palette mode. Adjust figure widths and update model timeline commentary.
# **Async All the Way Down**
Every Data Designer pipeline carries a map of what can run in parallel. Consider a pipeline that generates a `topic`, writes a `summary` and a `trivia` fact from that topic, then produces an `analysis` of the summary. `summary` and `trivia` both depend on `topic`, so they could run alongside each other. `analysis` depends on `summary`, so it has to wait — but only on the same row's summary, not the entire column. These references form a per-cell dependency graph. The previous engine used that graph to order columns, but it ran each column to completion before starting the next. A row's `analysis` couldn't start until *every* row of `summary` had finished, even though it only needed its own.
Suggested change: "The previous engine used that graph to order columns" → "Data Designer's original workflow engine used that graph to order columns"
Thanks, adopted your wording with a small tweak: added "within each batch" to clarify that the sync engine already split work into batches; it just ran columns sequentially within each one.
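The topic/summary/trivia/analysis pipeline discussed in this thread can be modeled as a per-cell graph, which is what makes the row-level unlocking possible. A small sketch using the stdlib `graphlib` module (the column names come from the post; the helper function and everything else is illustrative, not the actual engine's code):

```python
from graphlib import TopologicalSorter

# Column-level dependencies from the example pipeline.
deps = {
    "topic": set(),
    "summary": {"topic"},
    "trivia": {"topic"},
    "analysis": {"summary"},
}

def per_cell_graph(deps: dict, n_rows: int) -> dict:
    """Expand column deps to (row, column) nodes. Each cell depends only on
    the SAME row's inputs, so row 0's analysis never waits on row 1's summary."""
    return {
        (row, col): {(row, dep) for dep in col_deps}
        for row in range(n_rows)
        for col, col_deps in deps.items()
    }

ts = TopologicalSorter(per_cell_graph(deps, n_rows=2))
ts.prepare()
frontier = sorted(ts.get_ready())
# frontier == [(0, "topic"), (1, "topic")]: every row's topic is ready at once.
```

Once any single `(row, "summary")` cell is marked done, that row's `(row, "analysis")` joins the frontier immediately, which is exactly the behavior the column-at-a-time engine couldn't express.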
The scheduler maintains a *frontier* — the set of tasks whose inputs are all satisfied. Dispatch is a loop: pull ready tasks from the frontier, acquire a [semaphore](https://en.wikipedia.org/wiki/Semaphore_(programming)) slot, spawn a worker. When the worker completes, mark the cell done, which may add new tasks to the frontier. The loop runs until every cell in every row group has completed or been dropped.
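That dispatch loop reads naturally as asyncio code. A simplified sketch under stated assumptions: `run_cell` is a hypothetical per-cell coroutine, and the real scheduler additionally does the two-semaphore submission/LLM-wait handoff, which is omitted here:

```python
import asyncio
from graphlib import TopologicalSorter

async def run_graph(graph: dict, run_cell, max_in_flight: int = 4) -> list:
    """Frontier-driven dispatch: spawn a worker per ready task, gated by a semaphore."""
    sem = asyncio.Semaphore(max_in_flight)
    ts = TopologicalSorter(graph)
    ts.prepare()
    completed = []

    async def worker(task):
        async with sem:           # acquire a submission slot
            await run_cell(task)  # do the actual cell work
        ts.done(task)             # may push new tasks onto the frontier
        completed.append(task)

    pending = set()
    while ts.is_active():
        for task in ts.get_ready():  # pull newly-ready tasks off the frontier
            pending.add(asyncio.ensure_future(worker(task)))
        if pending:
            _, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
    if pending:
        await asyncio.wait(pending)
    return completed
```

The `FIRST_COMPLETED` wait is the key beat: the loop wakes as soon as any worker finishes, re-queries the frontier, and dispatches whatever that completion just unlocked, rather than waiting for a whole column or batch to drain.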
Two details matter here. Multi-column generators (where one generator produces several output columns) are deduplicated so they run once. And stateful generators like seed dataset readers get per-instance `asyncio.Lock`s to preserve row-group ordering, since the order rows are read from a seed dataset matters.
nit: I feel like this bit about multi-column generators and seed readers might be TMI. I get that we want to give technical details here, but the goal is for users not to need to worry about these deep implementation details. It's also a bit confusing, because the reader would need to understand how generators relate to columns and why we have multi-column generators in the first place.
Good call, removed the paragraph. The two-semaphore discussion below is the interesting detail worth keeping.
- Tighten intro to a concise abstract, move pipeline narrative into "The Bottleneck Was Structural" section
- Remove multi-column generators / seed readers paragraph (TMI)
- Clarify sync engine ran columns sequentially within each batch
📋 Summary
Add "Async All the Way Down" dev note covering the async task-queue scheduler and its impact on Data Designer pipeline performance. Covers the full async engine arc (PRs #356, #378, #404, #429, #456) in a single narrative post with benchmark results and original diagrams.
🔄 Changes
✨ Added
- `docs/devnotes/posts/async-engine.md` - dev note post (~1600 words, slop-guard 93/100)
- `docs/devnotes/posts/assets/async-engine/` - 6 figures (NVIDIA-styled, dark background + green accent)

🔧 Changed

- `docs/devnotes/.authors.yml` - added `amanoel` author entry
- `mkdocs.yml` - added nav entry (most-recent-first position)

🔍 Attention Areas

- `async-engine.md` - technical claims were cross-checked against implementation code (Kahn's algorithm, AIMD, symmetric bridging, semaphores, etc.) and benchmark scripts (DAG shapes, column dependencies). The "At higher record counts" section discusses rate-limiting tradeoffs qualitatively.
- `tmp_blog_content/` (not committed) for reference.

🤖 Generated with AI